CONCENTRATION OF NORMS AND EIGENVALUES OF RANDOM MATRICES

MARK W. MECKES 


Abstract. We prove concentration results for operator norms of rectangular random 
matrices and eigenvalues of self-adjoint random matrices. The random matrices we consider 
have bounded entries which are independent, up to a possible self-adjointness constraint. 
Our results are based on an isoperimetric inequality for product spaces due to Talagrand. 


1. Introduction 

In this paper we prove concentration results for norms of rectangular random matrices
acting as operators between $\ell_p$ spaces, and eigenvalues of self-adjoint random matrices.
Except for the self-adjointness condition when we consider eigenvalues, the only assumptions
on the distribution of the matrix entries are independence and boundedness. Our approach is
based on a powerful isoperimetric inequality for product probability spaces due to Talagrand [?].

Throughout this paper $X = X_{m,n}$ will stand for an $m \times n$ random matrix with real or
complex entries $x_{jk}$. (Specific technical conditions on the $x_{jk}$'s will be introduced as needed
for each result below.) If $1 \le p, q \le \infty$ and $A$ is an $m \times n$ matrix, we denote by $\|A\|_{p\to q}$ the
operator norm of $A : \ell_p^n \to \ell_q^m$. We denote by $p' = p/(p-1)$ the conjugate exponent of $p$.

For a real random variable $Y$ we denote by $\mathbb{E}Y$ the expected value and by $\mathbb{M}Y$ any median
of $Y$. Our first main result is the following.

Theorem 1. Let $1 \le p \le 2 \le q \le \infty$. Suppose the entries $x_{jk}$ of $X$ are independent complex
random variables, each supported in a set of diameter at most $D$. Then

$$\mathbb{P}\Big[\big|\|X\|_{p\to q} - \mathbb{M}\|X\|_{p\to q}\big| \ge t\Big] \le 4\exp\left(-\frac{t^r}{4D^r}\right) \tag{1}$$

for all $t > 0$, where $r = \min\{p', q\}$.

To prove Theorem 1 we show that Talagrand’s isoperimetric inequality, which at first
appears adapted primarily to prove normal concentration for functions which are Lipschitz
with respect to a Euclidean norm, actually implies sometimes stronger concentration for
functions which are Lipschitz with respect to more general norms. In particular, as we show
in Corollary 4 below, one obtains concentration of the kind in (1) for convex functions which
are Lipschitz with respect to $\ell_r$ norms for $r \ge 2$. Since such functions are automatically
Lipschitz with respect to the Euclidean norm, one can apply the known $r = 2$ case of this
fact directly, but would then obtain the upper bound with $r$ replaced by $2$ in the r.h.s. of
(1). Since the conclusion of Theorem 1 is trivial when $t/D < 1$, and $(t/D)^r \ge (t/D)^2$ when
$t/D \ge 1$, the estimate (1) is stronger than the estimate one would obtain this way.
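The comparison just made can be checked numerically. The following is a minimal sketch; the helper name `theorem1_bound` is hypothetical and not from the paper. It evaluates the right-hand side of (1) for a general exponent $r$ and compares it with the Euclidean ($r = 2$) version, taking $p = 4/3$, $q = 4$ (so $r = 4$) as an example.

```python
import math


def theorem1_bound(t, D, r):
    """Right-hand side of (1): 4 * exp(-(t/D)**r / 4)."""
    return 4.0 * math.exp(-((t / D) ** r) / 4.0)


D = 1.0
for t in [0.5, 1.0, 2.0, 3.0]:
    euclidean = theorem1_bound(t, D, r=2)  # what the r = 2 case alone gives
    lr = theorem1_bound(t, D, r=4)         # e.g. p = 4/3, q = 4, so r = 4
    print(f"t={t}: r=2 bound {euclidean:.4g}, r=4 bound {lr:.4g}")
```

For $t/D < 1$ both values exceed 1, so neither says anything. For $t/D \ge 1$ the $r = 4$ value is never the larger of the two.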

To put Theorem 1 in perspective, we consider the particular case in which $p = q'$, $m = n$,
and $\mathbb{P}[x_{jk} = 1] = \mathbb{P}[x_{jk} = -1] = 1/2$ for all $j, k$. In this situation,

$$n^{1/q} \le \mathbb{E}\|X\|_{q'\to q} \le C n^{1/q},$$

where $C > 0$ is a universal constant. Theorem 1 implies that, while $\|X\|_{q'\to q}$ achieves values
as large as $n^{2/q}$, it is comparable to its median except on a set whose probability decays
exponentially quickly as $n \to \infty$. Furthermore, in this situation the estimate in (1) is sharp
as long as $n^{-1/q}t$ is sufficiently large and $n^{-2/q}t$ is sufficiently small. These observations apply
in more general situations; see the remarks in Section 3 following the proof of Theorem 1.

If $A$ is a self-adjoint $n \times n$ matrix, we denote by $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$ the
eigenvalues of $A$, counted with multiplicity. Our second main result is the following.

Theorem 2. Suppose $m = n$ and the entries $x_{jk}$, $1 \le j \le k \le n$, of $X$ are independent
complex random variables such that:

(i) for $1 \le j \le n$, $x_{jj}$ is real and is supported on an interval of length at most $\sqrt{2}D$;
and

(ii) for $1 \le j < k \le n$, $x_{jk}$ is either supported on a set of diameter at most $D$, or
$x_{jk} = w_{jk}(\alpha_{jk} + i\beta_{jk})$, where $w_{jk} \in \mathbb{C}$ is a constant with $|w_{jk}| \le 1$, and $\alpha_{jk}, \beta_{jk}$ are
independent real random variables each supported in intervals of length at most $D$;

and that $x_{jk} = \overline{x_{kj}}$ for $k < j$. Then

$$\mathbb{P}\big[|\lambda_1(X) - \mathbb{M}\lambda_1(X)| \ge t\big] \le 4\exp\left(-\frac{t^2}{8D^2}\right) \tag{2}$$

for all $t > 0$, and the same holds if $\lambda_1(X)$ is replaced by $\lambda_n(X)$. Furthermore, for each
$2 \le k \le n - 1$, there exists an $M_k \in \mathbb{R}$ such that

$$\mathbb{P}\big[|\lambda_k(X) - M_k| \ge t\big] \le 8\exp\left(-\left(\frac{t}{2\sqrt{2}\,(\sqrt{k} + \sqrt{k-1})\,D}\right)^2\right) \tag{3}$$

for all $t > 0$, and the same upper bound holds if $\lambda_k(X)$ is replaced by $\lambda_{n-k+1}(X)$ and $M_k$ by
$M_{n-k+1}$.

Note that Theorem 2 applies in particular to the case of real symmetric random matrices
with off-diagonal entries supported in intervals of length $D$ and diagonal entries supported
in intervals of length $\sqrt{2}D$.

The proof of Theorem 2 is also based on Talagrand’s theorem, in this case applying it
only to functions which are Lipschitz with respect to a Euclidean norm. Theorem 2 is (up
to numerical constants) a sharpening and generalization of a result of Alon, Krivelevich, and
Vu [?]. Their proof is also based on Talagrand’s theorem, although they apply it in a very
different way. For perspective, we note that in the particular case in which $\mathbb{P}[x_{jk} = 1] =
\mathbb{P}[x_{jk} = -1] = 1/2$, $\mathbb{M}\lambda_1(X)$ is of the order $\sqrt{n}$, while $\lambda_1(X)$ can achieve values as large as
$n$. Furthermore, in this situation the estimate in (2) is sharp when $n^{-1/2}t$ is sufficiently large
and $n^{-1}t$ is sufficiently small. (See [?] for a discussion of this point when $X$ is the adjacency
matrix of the random graph $G(n, 1/2)$.) However, the estimate in (3) is probably not sharp
in its dependence on $k$. See the remarks in Section 3 following the proof of Theorem 2 for
details.

We note that aside from the uniform boundedness assumption, the distributions of the
independent entries of $X$ in Theorems 1 and 2 are completely arbitrary. In particular there
is no assumption of identical distribution of independent entries, and no assumption about
the values of their means.

We emphasize that our results are of interest as bounds for large deviations. Beginning
with the work of Tracy and Widom [?, ?], which has been refined and extended in [?, ?, ?],
it is known that, in typical situations, the kind of result contained in (2), while nontrivial,
is not sharp when $t$ is of smaller order than $\sqrt{n}$. More precisely, it has been established that
typically, one has concentration of the largest eigenvalue of the form

$$\mathbb{P}\big[\lambda_1(X) - \mathbb{E}\lambda_1(X) \ge t\big] \le C\exp\left(-\max\left\{C_1 n^{1/4} t^{3/2},\ C_2 t^2\right\}\right) \tag{4}$$

in the normalization used here, where $C, C_1, C_2 > 0$ are constants.

Talagrand’s theorem was first applied in the context of random matrices by Guionnet and
Zeitouni [?], who used it to prove a concentration result for the spectral measure of self-adjoint
random matrices, and who also remarked that the same methods give concentration
results for other functionals of self-adjoint matrices. For general discussions of applications
of concentration of measure phenomena to random matrices, see the survey [?] by Davidson
and Szarek and Section 8.5 of the book by Ledoux.

In Section 2 we show how to obtain concentration for Lipschitz functions on $\ell_q$ sum
spaces (and more general sums of normed spaces) from Talagrand’s isoperimetric inequality.
In Section 3 we prove Theorems 1 and 2 and give an infinite-dimensional version of Theorem
1 and a version of Theorem 2 for singular values of rectangular matrices. We also compare
the results obtained by our methods with the corresponding results for Gaussian random
matrices obtained from the Gaussian isoperimetric inequality.

2. General concentration results 

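We first need some notation. Let $(\Omega_1, \Sigma_1, \mu_1), \ldots, (\Omega_N, \Sigma_N, \mu_N)$ be probability spaces,
$\Omega = \Omega_1 \times \cdots \times \Omega_N$, $\mathbb{P} = \mu_1 \otimes \cdots \otimes \mu_N$. For $x = (x_1, \ldots, x_N) \in \Omega$, $y = (y_1, \ldots, y_N) \in \Omega$,
$h(x, y) \in \mathbb{R}^N$ is defined by

$$h(x, y)_j = \begin{cases} 1 & \text{if } x_j \ne y_j, \\ 0 & \text{if } x_j = y_j. \end{cases}$$

For $A \subseteq \Omega$, $x \in \Omega$, $U_A(x) = \{h(x, y) : y \in A\} \subseteq \mathbb{R}^N$. Finally, we define the convex hull
distance from $x$ to $A$ by

$$f_c(A, x) = \inf\{|z| : z \in \operatorname{conv} U_A(x)\}, \tag{5}$$

where $|\cdot|$ is the standard Euclidean norm and conv denotes the convex hull. Talagrand’s
isoperimetric inequality is the following.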

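Theorem 3 (Talagrand [?]). Let $(\Omega, \mathbb{P})$ be a product probability space as above. For any
$A \subseteq \Omega$,

$$\int \exp\left(\frac{1}{4} f_c(A, x)^2\right) d\mathbb{P}(x) \le \frac{1}{\mathbb{P}(A)},$$

which by Chebyshev’s inequality implies

$$\mathbb{P}(\{x : f_c(A, x) \ge t\}) \le \frac{e^{-t^2/4}}{\mathbb{P}(A)}$$

for all $t > 0$.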
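Theorem 3 is stated for arbitrary sets. When $A$ is a finite set of points, however, the convex hull distance (5) reduces to a small quadratic program: minimize $|z|^2$ over the convex hull of the finitely many vectors $h(x, y)$, $y \in A$. The following is a minimal sketch under that finiteness assumption. The function names, and the choice of the Frank-Wolfe method with exact line search, are ours for illustration and are not part of Talagrand's argument.

```python
import numpy as np


def h(x, y):
    """h(x, y)_j = 1 if x_j != y_j and 0 otherwise."""
    return np.array([1.0 if xj != yj else 0.0 for xj, yj in zip(x, y)])


def convex_hull_distance(A, x, iters=10_000, tol=1e-12):
    """Approximate f_c(A, x) for a finite nonempty set A of points.

    Minimizes |z|^2 over conv U_A(x) with the Frank-Wolfe method and exact
    line search; returns the Euclidean norm of the final iterate.
    """
    U = np.array([h(x, y) for y in A])   # the points of U_A(x), one per row
    z = U[0].copy()
    for _ in range(iters):
        v = U[np.argmin(U @ z)]          # vertex minimizing the linearization <z, v>
        d = z - v
        gap = z @ d                      # Frank-Wolfe gap; zero at the optimum
        if gap <= tol:
            break
        gamma = min(1.0, gap / (d @ d))  # minimizer of |z - gamma * d|^2 on [0, 1]
        z = z - gamma * d
    return float(np.linalg.norm(z))


# x differs from (1, 0) only in coordinate 1 and from (0, 1) only in coordinate 2,
# so U_A(x) = {e_1, e_2} and f_c(A, x) = |(1/2, 1/2)| = 1/sqrt(2).
print(convex_hull_distance([(1, 0), (0, 1)], (0, 0)))
```

Note that $f_c(A, x) = 0$ exactly when $x \in A$ (for finite $A$), and that a single point $y$ differing from $x$ in $k$ coordinates gives $f_c = \sqrt{k}$.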


As in [?], we have ignored measurability issues in the statement of Theorem 3. To be
strictly correct, the integrals and probabilities which appear must be replaced by upper
integrals and outer probabilities; however, this issue is irrelevant in applications, since one
typically applies such a result to estimate expressions in which all the functions and sets
which appear are measurable.

Let $\|\cdot\|_E$ be a 1-unconditional norm on $\mathbb{R}^N$, by which we mean that the standard basis
of $\mathbb{R}^N$ is a 1-unconditional basis for $\|\cdot\|_E$ (see [?]; such a norm is also sometimes called
absolute). For normed vector spaces $(V_j, \|\cdot\|_{V_j})$, $j = 1, \ldots, N$, we denote by

$$V_E = (V_1 \oplus \cdots \oplus V_N)_E$$

the direct sum of vector spaces with the norm

$$\|(v_1, \ldots, v_N)\|_{V_E} = \big\|\big(\|v_1\|_{V_1}, \ldots, \|v_N\|_{V_N}\big)\big\|_E.$$
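Concretely, the norm of $V_E$ is computed in two stages: first take each block's own norm, then apply $\|\cdot\|_E$ to the resulting vector of block norms. The following minimal sketch uses hypothetical names `direct_sum_norm` and `ell` and takes $E = \ell_q^N$ as the example outer norm.

```python
import numpy as np


def direct_sum_norm(blocks, block_norms, outer_norm):
    """Norm of (v_1, ..., v_N) in V_E: apply ||.||_E to (||v_1||_{V_1}, ..., ||v_N||_{V_N})."""
    inner = np.array([float(norm(v)) for v, norm in zip(blocks, block_norms)])
    return outer_norm(inner)


def ell(q):
    """The l_q norm on R^N (q may be np.inf)."""
    return lambda a: float(np.linalg.norm(a, ord=q))


# Three blocks: V_1 = C with |.|, V_2 = R^2 with l_1, V_3 = R^2 with l_inf.
blocks = [3 + 4j, np.array([1.0, -2.0]), np.array([0.5, -6.0])]
norms = [abs, ell(1), ell(np.inf)]
print(direct_sum_norm(blocks, norms, ell(2)))  # l_2 sum: ||(5, 3, 6)||_2 = sqrt(70)
```

Because the outer norm only sees the nonnegative block norms, the 1-unconditionality of $\|\cdot\|_E$ is exactly what makes this a norm on $V_E$ that behaves well under the blockwise estimates used later.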

Theorems 1 and 2 will be proved using the following consequence of Theorem 3.



Corollary 4. Let $V$ be the $\ell_q$ sum of the normed vector spaces $(V_j, \|\cdot\|_{V_j})$, $j = 1, \ldots, N$,
for $q \ge 2$; that is, $V = V_{\ell_q^N}$ in the notation above. For $j = 1, \ldots, N$, let $\mu_j$ be a probability
measure on $V_j$ which is supported on a compact set of diameter at most 1. Let $\mathbb{P} = \mu_1 \otimes \cdots \otimes \mu_N$.
Suppose $F : V \to \mathbb{R}$ is 1-Lipschitz and quasiconvex, that is, $F^{-1}((-\infty, a])$ is convex for
all $a \in \mathbb{R}$. Then

$$\mathbb{P}\big[|F - \mathbb{M}F| \ge t\big] \le 4e^{-t^q/4} \tag{6}$$

for all $t > 0$.


We will postpone the proof of Corollary 4 until after some remarks. The $q = 2$ case of
Corollary 4 has been widely noted and applied in various degrees of generality already; see
[?, ?, ?] and their references. As observed in the introduction, if $F$ is 1-Lipschitz with
respect to the $\ell_q$ sum norm on $V$, then $F$ is also 1-Lipschitz with respect to the $\ell_2$ sum norm
on $V$. However, applying the $q = 2$ case of Corollary 4 directly in this situation yields only
the weaker upper bound $4e^{-t^2/4}$ in the inequality (6).

Corollary 4 can be applied in the case that $F$ is $L$-Lipschitz and each $\mu_j$ is supported
on a set of diameter at most $D$, by replacing $t$ with $t/(LD)$ in the r.h.s. of (6). This fact is
used implicitly in the proofs in Section 3. The conclusion of Corollary 4 also holds when
$F$ is replaced by $-F$, so that Corollary 4 (and Proposition 5 below) also applies when $F$ is
quasiconcave, that is, when $F^{-1}([a, \infty))$ is convex for all $a \in \mathbb{R}$. In particular, Corollary 4
applies to both convex and concave Lipschitz functions. In [?], in which the special case of
Theorem 3 for the uniform measure on the discrete cube was first proved, Talagrand gives an
example which shows that some form of convexity assumption in Corollary 4 cannot be
removed in general.
The conclusion of Corollary 4 does not hold in general for functions which are only Lipschitz
with respect to an $\ell_p$ sum norm for $1 \le p < 2$ without the introduction of dimension-dependent
constants, even if the bound in (6) is replaced by any other dimension-independent
function which approaches 0 at infinity. To see this, let $\{Y_j : j \in \mathbb{N}\}$ be independent
random variables with $\mathbb{P}[Y_j = 0] = \mathbb{P}[Y_j = 1] = 1/2$ for each $j$, and let $S_n = \sum_{j=1}^n Y_j$ for an
arbitrary $1 \le n \le N$. Then $S_n^{1/p} = \|(Y_1, \ldots, Y_n)\|_p$, and $(n/2)^{1/p}$ is a median for $S_n^{1/p}$. Suppose
we have a concentration result which implies there exists a function $f$ with $\lim_{t\to\infty} f(t) = 0$
such that for all $n$ and for all $t > 0$,

$$\mathbb{P}\Big[\big|S_n^{1/p} - (n/2)^{1/p}\big| \ge t\Big] \le f(t). \tag{7}$$

Then by Taylor’s theorem (applied to $u \mapsto (1+u)^{1/p}$ at $u = 0$), for fixed $t > 0$,

$$\mathbb{P}\left[\left|\frac{S_n - n/2}{\sqrt{n}/2}\right| \ge t\right] \le \mathbb{P}\left[\big|S_n^{1/p} - (n/2)^{1/p}\big| \ge \frac{t\,n^{1/p - 1/2}}{2^{1/p}\,p} + O\big(n^{1/p - 1}\big)\right] \le f\left(\frac{t\,n^{1/p - 1/2}}{2^{1/p}\,p} + O\big(n^{1/p - 1}\big)\right),$$

which, since $1/p - 1/2 > 0$, implies that for any $t > 0$,

$$\lim_{n\to\infty} \mathbb{P}\left[\left|\frac{S_n - n/2}{\sqrt{n}/2}\right| \ge t\right] = 0,$$

which contradicts the central limit theorem. Since concentration of the kind in the inequality
(7) about any value implies concentration about a median (with a possibly different function
$f$), no such concentration result holds for $S_n^{1/p}$ when $1 \le p < 2$.
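The dichotomy between exponents below and above 2 can be seen numerically, since the tail probability in (7) can be computed exactly from the binomial distribution ($n/2$ is a median of $S_n$). The following illustrative sketch uses a helper name `tail_prob` chosen here. With $p = 1$ and fixed $t$, the probability grows toward 1 as $n$ increases, as the central limit theorem predicts. With exponent $4 > 2$ it shrinks instead.

```python
import math


def tail_prob(n, p, t):
    """Exact P[|S_n^(1/p) - (n/2)^(1/p)| >= t] for S_n ~ Binomial(n, 1/2)."""
    median = (n / 2) ** (1 / p)
    hits = sum(math.comb(n, k) for k in range(n + 1)
               if abs(k ** (1 / p) - median) >= t)
    return hits / 2 ** n  # exact integer counts divided by 2^n


for n in [4, 16, 64, 256, 1024]:
    print(n, tail_prob(n, p=1, t=2.0), tail_prob(n, p=4, t=0.5))
```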

It is also not difficult to see that for the examples of $S_n^{1/q}$ with $q > 2$, the concentration
result of Corollary 4 is sharp, up to the values of numerical constants, when
$c = c(q) \le n^{-1/q}t \le 1 - 2^{-1/q}$. Moreover, by gluing together copies of $S_n^{1/q}$ for different values of $n$,
one obtains an example of a Lipschitz function for which Corollary 4 is sharp for the entire
nontrivial range of $t$.

It is more typical to state results of the type in Corollary 4 in terms of deviations of a
random variable from the mean rather than the median. This difference is inessential, since
this level of concentration implies that the median and mean cannot be too far apart. For
example, in the situation of Corollary 4 we have

$$|\mathbb{E}F - \mathbb{M}F| \le \mathbb{E}|F - \mathbb{M}F| = \int_0^\infty \mathbb{P}\big[|F - \mathbb{M}F| \ge t\big]\,dt \le 4\int_0^\infty e^{-t^q/4}\,dt = 4^{1 + 1/q}\,\Gamma\left(1 + \frac{1}{q}\right).$$
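The closed form in the last step follows from the substitution $u = t^q/4$. As a quick sanity check, the following minimal sketch compares a trapezoid-rule approximation of $4\int_0^\infty e^{-t^q/4}\,dt$ (truncated at a large upper limit, an assumption that is harmless because the integrand decays superexponentially) with $4^{1+1/q}\Gamma(1 + 1/q)$.

```python
import math


def tail_integral(q, upper=40.0, steps=100_000):
    """Trapezoid-rule approximation of 4 * integral_0^upper exp(-t**q / 4) dt."""
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-(upper ** q) / 4))
    for i in range(1, steps):
        total += math.exp(-((i * h) ** q) / 4)
    return 4 * h * total


def closed_form(q):
    """4^(1 + 1/q) * Gamma(1 + 1/q)."""
    return 4 ** (1 + 1 / q) * math.gamma(1 + 1 / q)


for q in [2, 3, 4, 8]:
    print(q, tail_integral(q), closed_form(q))
```

For $q = 2$ both values equal $4\sqrt{\pi} \approx 7.09$, and the bound decreases toward 4 as $q$ grows.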

We now turn to the proof of Corollary 4. Rather than prove Corollary 4 directly from
Theorem 3, we will deduce it as a special case of Proposition 5 below, which uses Theorem
3 to derive concentration for functions which are Lipschitz with respect to an arbitrary
1-unconditional norm, in terms of a kind of modulus function for the norm. Theorem 3 bounds
the size of the set of points which are far from a set $A$ in terms of the convex hull distance
$f_c(A, \cdot)$ from $A$. Thus it provides concentration for functions which satisfy a Lipschitz-type
condition with respect to the convex hull distance. However, since in general this distance
is not induced by a metric, some care is needed in its application.

Let $\|\cdot\|_E$ be a 1-unconditional norm on $\mathbb{R}^N$ as above. We define

$$K_E(t) = \inf\{|x| : \|x\|_E \ge t,\ \|x\|_\infty \le 1\},$$

where we use the convention that $\inf \emptyset = \infty$.


Proposition 5. Let $V_E$ be as described before Corollary 4 and let $K_E$ be as above. For
$j = 1, \ldots, N$, let $\mu_j$ be a probability measure on $V_j$ which is supported on a compact set of
diameter at most 1. Let $\mathbb{P} = \mu_1 \otimes \cdots \otimes \mu_N$. Suppose $F : V_E \to \mathbb{R}$ is 1-Lipschitz and that $F$
is quasiconvex. Then

$$\mathbb{P}\big[|F - \mathbb{M}F| \ge t\big] \le 4\exp\left(-\frac{K_E(t)^2}{4}\right) \tag{8}$$

for all $t > 0$.

It is easy to verify that $K_{\ell_q^N}(t) \ge t^{q/2}$ for $q \ge 2$ and any $N$ (cf. Lemma 6 below), so
that Corollary 4 follows immediately from Proposition 5.




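Proof of Proposition 5. First we show that

$$K_E\big(\operatorname{dist}(x, \operatorname{conv} A)\big) \le f_c(A, x) \tag{9}$$

for $x = (x_1, \ldots, x_N) \in \operatorname{supp}(\mathbb{P})$ and $\emptyset \ne A \subseteq V_E$, where dist is the distance in the normed
space $V_E$. Let $y^k = (y_1^k, \ldots, y_N^k) \in A \cap \operatorname{supp}(\mathbb{P})$ and $0 \le \theta_k \le 1$ for $k = 1, \ldots, n$ such that
$\sum_{k=1}^n \theta_k = 1$. Then for each $j = 1, \ldots, N$,

$$\Big\|x_j - \sum_{k=1}^n \theta_k y_j^k\Big\|_{V_j} \le \sum_{k=1}^n \theta_k \|x_j - y_j^k\|_{V_j} \le \sum_{k=1}^n \theta_k\, h(x, y^k)_j,$$

since $x_j, y_j^k \in \operatorname{supp}(\mu_j)$ for each $j, k$. Then by unconditionality,

$$\operatorname{dist}(x, \operatorname{conv} A) \le \Big\|x - \sum_{k=1}^n \theta_k y^k\Big\|_{V_E} \le \Big\|\sum_{k=1}^n \theta_k\, h(x, y^k)\Big\|_E,$$

and so, since the vector $\sum_k \theta_k h(x, y^k)$ has all its entries in $[0, 1]$ and $K_E$ is nondecreasing,

$$K_E\big(\operatorname{dist}(x, \operatorname{conv} A)\big) \le \Big|\sum_{k=1}^n \theta_k\, h(x, y^k)\Big|.$$

The inequality (9) now follows since the r.h.s. of (5) is precisely the infimum of this last
expression over all such finite sequences $y^k, \theta_k$, $k = 1, \ldots, n$. Therefore, by Theorem 3, for
any $A \subseteq V_E$,

$$\mathbb{P}\big(\{x : K_E(\operatorname{dist}(x, \operatorname{conv} A)) \ge t\}\big) \le \frac{e^{-t^2/4}}{\mathbb{P}(A)}$$

for all $t > 0$. Thus if $F$ is quasiconvex and 1-Lipschitz on $V_E$, we have that for any $a \in \mathbb{R}$,
$t > 0$,

$$\mathbb{P}[F \le a]\,\mathbb{P}[F \ge a + t] \le \mathbb{P}[F \le a]\,\mathbb{P}\Big(\big\{x : K_E\big(\operatorname{dist}(x, F^{-1}((-\infty, a]))\big) \ge K_E(t)\big\}\Big) \le \exp\left(-\frac{K_E(t)^2}{4}\right).$$

Applying this in turn with $a = \mathbb{M}F$ and $a = \mathbb{M}F - t$, we get

$$\mathbb{P}[F - \mathbb{M}F \ge t] \le 2\exp\left(-\frac{K_E(t)^2}{4}\right), \qquad \mathbb{P}[F - \mathbb{M}F \le -t] \le 2\exp\left(-\frac{K_E(t)^2}{4}\right)$$

for every $t > 0$. □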


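In order to apply Proposition 5, one needs to estimate the function $K_E$. This is of
most interest if one can bound $K_{E_j}$ uniformly for some family of spaces $E_j$ for which
$\sup_j \dim(E_j) = \infty$. This is not difficult to do for certain classes of spaces. For a
nonincreasing sequence $w = (w_1, w_2, \ldots)$ of positive numbers and $p \ge 1$, the $N$-dimensional
Lorentz space $\ell_{w,p}^N$ is $\mathbb{R}^N$ with the norm

$$\|x\|_{w,p} = \left(\sum_{j=1}^N w_j\,(x_j^*)^p\right)^{1/p},$$

where $\{x_j^* : 1 \le j \le N\}$ is the nonincreasing rearrangement of $\{|x_j| : 1 \le j \le N\}$. For an
Orlicz function $\psi$, that is, a convex nondecreasing function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\psi(0) = 0$
and $\lim_{t\to\infty} \psi(t) = \infty$, the $N$-dimensional Orlicz space $\ell_\psi^N$ is $\mathbb{R}^N$ with the norm

$$\|x\|_\psi = \inf\left\{\rho > 0 : \sum_{i=1}^N \psi\left(\frac{|x_i|}{\rho}\right) \le 1\right\}.$$

Observe that $\ell_p^N = \ell_{w,p}^N$ if $w_j = 1$ for $j = 1, \ldots, N$, and $\ell_\psi^N = \ell_p^N$ if $\psi(t) = t^p$, $p \ge 1$. For these
two classes of spaces, we have the following elementary estimates, which we state without
proof.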
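Both families of norms are easy to evaluate numerically. The following minimal sketch uses hypothetical function names. The Orlicz (Luxemburg) norm is found by bisection, a choice of ours that relies only on the fact that $\rho \mapsto \sum_i \psi(|x_i|/\rho)$ is nonincreasing. The sketch also confirms the two reductions to $\ell_p^N$ noted above.

```python
import numpy as np


def lorentz_norm(x, w, p):
    """||x||_{w,p} = (sum_j w_j (x*_j)^p)^(1/p), x* = nonincreasing rearrangement of |x|."""
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    ws = np.asarray(w, dtype=float)[: xs.size]
    return float(np.sum(ws * xs ** p) ** (1.0 / p))


def orlicz_norm(x, psi, iters=200):
    """Luxemburg norm inf{rho > 0 : sum_i psi(|x_i| / rho) <= 1}, found by bisection."""
    a = np.abs(np.asarray(x, dtype=float))
    if not a.any():
        return 0.0
    lo, hi = 0.0, 1.0
    while np.sum(psi(a / hi)) > 1.0:   # enlarge hi until it is feasible
        hi *= 2.0
    for _ in range(iters):             # invariant: hi feasible, lo infeasible (or 0)
        mid = 0.5 * (lo + hi)
        if np.sum(psi(a / mid)) <= 1.0:
            hi = mid
        else:
            lo = mid
    return float(hi)


x = [3.0, -4.0, 1.0]
print(lorentz_norm(x, [1, 1, 1], 2), orlicz_norm(x, lambda s: s ** 2))  # both ||x||_2
```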
two classes of spaces, we have the following elementary estimates, which we state without 
proof. 


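Lemma 6. If $p \ge 1$ and $w \in \ell_r$ for some $r$ such that $\max\{1, 2/p\} \le r' \le \infty$, then

$$K_{\ell_{w,p}^N}(t) \ge \left(\frac{t^p}{\|w\|_r}\right)^{r'/2}.$$

If $\psi$ is any Orlicz function, then

$$K_{\ell_\psi^N}(t) \ge \inf_{0 < u \le 1} \frac{u}{\sqrt{\psi(u/t)}}.$$

In particular, $K_{\ell_q^N}(t) \ge t^{q/2}$ for $q \ge 2$.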
Note that the estimates in Lemma 6 may be trivial and are not necessarily optimal, but
when they are nontrivial, they are valid in all dimensions. By considering vectors $x \in \{0, 1\}^N$,
one can see that the estimate $K_{\ell_q^N}(t) \ge t^{q/2}$ for $q \ge 2$ is sharp for $t = k^{1/q}$, $k = 1, \ldots, N$.
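The bound $K_{\ell_q^N}(t) \ge t^{q/2}$ comes from the observation that $|x_j|^q \le |x_j|^2$ when $|x_j| \le 1$ and $q \ge 2$, so $\|x\|_q^q \le |x|^2$. The following minimal sketch (function names are illustrative) checks this inequality on random vectors in the unit cube. It also checks the equality case on $\{0,1\}^N$, where a vector with $k$ ones has $t = \|x\|_q = k^{1/q}$ and $|x| = \sqrt{k} = t^{q/2}$.

```python
import numpy as np


def lq_norm(x, q):
    return float(np.sum(np.abs(x) ** q) ** (1.0 / q))


def lower_bound_holds(q, N=20, trials=2000, seed=0):
    """Check |x| >= ||x||_q ** (q/2) on random x with ||x||_inf <= 1."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, size=N)
        if np.linalg.norm(x) < lq_norm(x, q) ** (q / 2) - 1e-12:
            return False
    return True


def zero_one_vector(k, N, q):
    """x in {0,1}^N with k ones: returns (t, |x|) where t = ||x||_q = k^(1/q)."""
    x = np.zeros(N)
    x[:k] = 1.0
    return lq_norm(x, q), float(np.linalg.norm(x))


for k in [1, 4, 9]:
    t, euclid = zero_one_vector(k, 10, q=3)
    print(k, t, euclid, t ** 1.5)  # equality |x| = t^(q/2)
```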

Observe that the proof of Proposition 5 actually gives separate tail estimates for deviations
of $F$ above and below its median; the same is therefore true of Corollary 4 as well. The full
generality of Proposition 5 can in fact be derived with some amount of argument from
the (known) $q = 2$ case of Corollary 4, using these bounds separately; however, we find
it simpler to argue directly from the isoperimetric inequality of Theorem 3 as above. One
could alternatively prove Corollary 4 by proving an $\ell_q$ version of Theorem 3, by defining
an $\ell_q$ convex hull distance $f_q(A, x) = \inf\{\|z\|_q : z \in \operatorname{conv} U_A(x)\}$ and mimicking the proof
of Theorem 3, or as a corollary to the more general and abstract Theorem 4.2.4 in [?].
However, this approach would result only in a slight sharpening of the constant 1/4 which
appears in the exponent.

We remark that to use Proposition 5 to full advantage for non-Euclidean norms, one must
use a nonlinear lower bound on $K_E$ and make use of the restriction $\|x\|_\infty \le 1$. If, for example,
one uses only the fact that $\|x\|_q \le |x|$ for all $x$ when $q \ge 2$, then one is using no more than
the fact that a function which is 1-Lipschitz with respect to the $\ell_q$ norm is 1-Lipschitz with
respect to the $\ell_2$ norm, which, as we have observed already in the introduction, leads to a
weaker concentration result.


It is instructive to compare the general concentration results above and the applications in
the next section with the corresponding results for Gaussian measures. We begin by recalling
the functional form of the Gaussian isoperimetric inequality, due independently to Borell [?]
and Sudakov and Tsirel’son [?]. Let $\gamma_N$ be the standard Gaussian measure on $\mathbb{R}^N$ defined
by $d\gamma_N(x) = (2\pi)^{-N/2} e^{-|x|^2/2}\,dx$, where $|\cdot|$ is again the standard Euclidean norm.

Theorem 7 (Borell, Sudakov-Tsirel’son). Let $F : \mathbb{R}^N \to \mathbb{R}$ be 1-Lipschitz with respect to
the Euclidean metric on $\mathbb{R}^N$. Then

$$\gamma_N(\{x : F(x) \ge \mathbb{M}F + t\}) \le 1 - \gamma_1((-\infty, t]) \le \frac{1}{2}e^{-t^2/2}$$

for all $t > 0$.
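The middle term in Theorem 7 is the standard normal upper tail, which can be written with the complementary error function as $1 - \gamma_1((-\infty, t]) = \tfrac{1}{2}\operatorname{erfc}(t/\sqrt{2})$. The following short sketch (the helper name is ours) compares it numerically with $\tfrac{1}{2}e^{-t^2/2}$.

```python
import math


def gaussian_upper_tail(t):
    """1 - gamma_1((-inf, t]) = P[Z > t] for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(t, gaussian_upper_tail(t), 0.5 * math.exp(-t * t / 2))
```

The two sides agree at $t = 0$, and the exponential bound overestimates the exact tail by a factor of roughly $t$ for large $t$.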


Observe that by composing $F$ with an affine contraction, one obtains the same conclusion
in Theorem 7 if the standard Gaussian measure $\gamma_N$ is replaced by the product of
one-dimensional Gaussian measures with arbitrary means and variances at most 1. Thus the
$q = 2$ case of Corollary 4 provides a level of concentration for quasiconvex Lipschitz
functions of independent bounded random variables comparable to the concentration of Lipschitz
functions of independent Gaussian random variables with bounded variances.

A similar concentration principle is obeyed by any probability measure which satisfies a
logarithmic Sobolev inequality (see [?]). Specifically, if $\mu$ is a probability measure on $\mathbb{R}^N$
which has logarithmic Sobolev constant at most 1, and $F : \mathbb{R}^N \to \mathbb{R}$ is 1-Lipschitz with
respect to the Euclidean metric on $\mathbb{R}^N$, then

$$\mu(\{x : F(x) \ge \mathbb{E}F + t\}) \le e^{-t^2/2}$$

for all $t > 0$. Since logarithmic Sobolev inequalities tensorize, one obtains concentration
for Lipschitz functions of independent random variables whose distributions have uniformly
bounded logarithmic Sobolev constants. In particular, whenever we state concentration
results below for random matrices with Gaussian entries, similar results hold under the
weaker assumption of entries with uniformly bounded logarithmic Sobolev constants. We
remark that Guionnet and Zeitouni [?] also proved a concentration result for the spectral
measure in the case that the matrix entries satisfy a logarithmic Sobolev inequality.


3. Norms and eigenvalues of random matrices 


Since any norm on a real or complex vector space is a convex function, Proposition 5
can be applied directly to obtain concentration of norms of a random matrix $X$; all that is
necessary is to estimate the function $K_E$, or the Lipschitz constant of the given norm with
respect to one for which a bound on $K_E$ is known. Note that by the triangle inequality,

$$\big|\|x\| - \|y\|\big| \le \|x - y\|$$

for any norm, which implies that to estimate the Lipschitz constant of one norm with respect
to another norm, it suffices to estimate the appropriate equivalence constant.


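Proof of Theorem 1. For an $m \times n$ matrix $A$, let $A_j \in \mathbb{C}^n$ denote the $j$th row of $A$. Then
Hölder’s inequality implies

$$\|A\|_{p\to q} \le \big\|\big(\|A_1\|_{p'}, \ldots, \|A_m\|_{p'}\big)\big\|_q \le \|(a_{jk})\|_r, \tag{10}$$

where $(a_{jk})$ represents the matrix $A$ thought of as an element of $\mathbb{C}^{mn}$, and we recall that
$r = \min\{p', q\}$ (the second inequality holds because $r \le p'$ and $r \le q$). The claim follows by using
this estimate and taking $V_j = \mathbb{C}$ for each $j$ in Corollary 4 (with $r$ in place of $q$). Alternatively,
the inequality (10) and Lemma 6 imply that

$$K_{L(\ell_p^n, \ell_q^m)}(t) \ge t^{r/2},$$

where $L(\ell_p^n, \ell_q^m)$ is identified with $\mathbb{C}^{mn}$ via the standard bases, so that the claim follows from
Proposition 5. □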
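Inequality (10) can be sanity-checked on random data. Exact $p \to q$ operator norms are hard to compute in general, so the following minimal sketch only tests $\|Ax\|_q \le (\text{row bound})\,\|x\|_p$ on random vectors $x$, together with the comparison of the row bound to the entrywise $\ell_r$ norm. Real matrices are used for simplicity, and the function names are illustrative.

```python
import numpy as np


def lp(v, p):
    return float(np.linalg.norm(np.asarray(v, dtype=float), ord=p))


def check_inequality_10(m, n, p, q, trials=500, seed=1):
    """Check ||A x||_q <= row_bound * ||x||_p and row_bound <= entry_bound on random data."""
    rng = np.random.default_rng(seed)
    pc = np.inf if p == 1 else p / (p - 1)  # conjugate exponent p'
    r = min(pc, q)
    A = rng.uniform(-1.0, 1.0, size=(m, n))
    row_bound = lp([lp(A[j], pc) for j in range(m)], q)  # middle term of (10)
    entry_bound = lp(A.ravel(), r)                       # right-hand side of (10)
    assert row_bound <= entry_bound + 1e-12
    for _ in range(trials):
        x = rng.standard_normal(n)
        assert lp(A @ x, q) <= row_bound * lp(x, p) + 1e-9
    return row_bound, entry_bound


print(check_inequality_10(5, 7, p=1.5, q=3))
```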


We remark that Theorem 1 can be extended to more general norms on $M_{m,n}(\mathbb{C})$ by using
Proposition 5 together with estimates on the corresponding function $K_E$. In particular, as
long as one has the appropriate Lipschitz estimates, the underlying normed spaces need not
be unconditional, nor must the norm on matrices even be an operator norm.

Now for comparison, we let $G = G_{m,n}$ be an $m \times n$ random matrix whose entries are
independent Gaussian random variables with arbitrary means and variances at most 1. For
$1 \le p \le 2 \le q \le \infty$, $\|A\|_{p\to q} \le \|A\|_2$ for any $m \times n$ matrix $A$, where $\|A\|_2$ is the
Hilbert-Schmidt norm of $A$. Then Theorem 7 implies that

$$\mathbb{P}\Big[\big|\|G\|_{p\to q} - \mathbb{M}\|G\|_{p\to q}\big| \ge t\Big] \le e^{-t^2/2}$$

for all $t > 0$. Observe that this is comparable to what one would obtain in the case of
independent bounded entries by using only the $q = 2$ case of Corollary 4.

Theorem 1 implies that the order of fluctuations of $\|X_{m,n}\|_{p\to q}$ about its median is $O(1)$,
independent of $m$ and $n$. In typical situations, the median itself grows without bound as $m$
or $n$ does. Suppose for example that $\mathbb{E}|x_{jk}| \ge c > 0$ for all $j, k$. (In the situation of Theorem
1, this will be the case if each $x_{jk}$ is real, $|x_{jk}| \le 1$, $\mathbb{E}x_{jk} = 0$, and $x_{jk}$ has variance at least
$c$, since then $\mathbb{E}|x_{jk}| \ge \mathbb{E}|x_{jk}|^2$.) Then

$$\mathbb{E}\|X\|_{p\to q} \ge \mathbb{E}\|Xe_1\|_q \ge m^{1/q - 1}\,\mathbb{E}\|Xe_1\|_1 = m^{1/q - 1}\sum_{j=1}^m \mathbb{E}|x_{j1}| \ge c\,m^{1/q}.$$

Since $\|X\|_{p\to q} = \|X^*\|_{q'\to p'}$, we obtain $\mathbb{E}\|X\|_{p\to q} \ge c\max\{m^{1/q}, n^{1/p'}\}$. As remarked earlier,
$\mathbb{M}\|X\|_{p\to q}$ will also have at least this order when the hypotheses of Theorem 1 are satisfied.

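A similar upper estimate is possible in the case $p = q'$. Suppose that each $x_{jk}$ is a
symmetric real random variable such that $|x_{jk}| \le 1$. We note first that by the Riesz convexity
theorem, for any real $m \times n$ matrix $A$ with $|a_{jk}| \le 1$ for all $j, k$,

$$\|A\|_{q'\to q} \le \|A\|_{2\to 2}^{2/q}\,\|A\|_{1\to\infty}^{1 - 2/q} \le \|A\|_{2\to 2}^{2/q}.$$

By the contraction principle (see [?, Theorem 4.4]),

$$\mathbb{E}\|X\|_{2\to 2} \le \mathbb{E}\|\tilde{X}\|_{2\to 2},$$

where $\tilde{X} = \tilde{X}_{m,n}$ is an $m \times n$ matrix whose entries are independent Rademacher (Bernoulli)
random variables; that is, $\mathbb{P}[\tilde{x}_{jk} = 1] = \mathbb{P}[\tilde{x}_{jk} = -1] = 1/2$ for all $j, k$. By standard
comparisons between Rademacher and Gaussian averages and Chevet’s inequality [?] (see
also [?]),

$$\mathbb{E}\|\tilde{X}\|_{2\to 2} \le C(\sqrt{m} + \sqrt{n}),$$

where $C > 0$ is an absolute numerical constant. Therefore in this situation, by Jensen’s inequality,

$$\mathbb{E}\|X\|_{q'\to q} \le \big(\mathbb{E}\|X\|_{2\to 2}\big)^{2/q} \le C\max\{m, n\}^{1/q},$$

with a possibly different absolute constant $C$. (The argument above is entirely standard and the
estimate is probably known, although we could not find a reference in the literature.)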
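The interpolation bound $\|A\|_{q'\to q} \le \|A\|_{2\to 2}^{2/q}$ for matrices with entries bounded by 1 can be probed numerically. The following minimal sketch draws random test vectors, which only give a lower estimate of $\|A\|_{q'\to q}$, and checks that this estimate never exceeds the bound. Here `np.linalg.norm(A, 2)` is NumPy's largest singular value, i.e. $\|A\|_{2\to 2}$, and the function name is illustrative.

```python
import numpy as np


def check_riesz_bound(m, n, q, trials=500, seed=2):
    """Sampled lower estimate of ||A||_{q'->q} versus ||A||_{2->2}^(2/q), with |a_jk| <= 1."""
    rng = np.random.default_rng(seed)
    qc = q / (q - 1)                         # q', the conjugate exponent
    A = rng.uniform(-1.0, 1.0, size=(m, n))  # entries bounded by 1, so ||A||_{1->inf} <= 1
    bound = np.linalg.norm(A, 2) ** (2 / q)
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n)
        worst = max(worst, np.linalg.norm(A @ x, q) / np.linalg.norm(x, qc))
    return float(worst), float(bound)


worst, bound = check_riesz_bound(6, 8, q=4)
print(worst, bound)
```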

The example of $\tilde{X}$ above can be used to show that the estimate in Theorem 1 is sharp for
large enough values of $t$ up to numerical constants in the case that $p = q'$. For $1 \le a \le m$,
$1 \le b \le n$,

$$\mathbb{P}\big[\|\tilde{X}\|_{q'\to q} \ge (ab)^{1/q}\big] \ge \mathbb{P}\big[\tilde{X} \text{ has an } a \times b \text{ all-1 submatrix}\big] \ge 2^{-ab},$$

so that

$$\mathbb{P}\big[\|\tilde{X}\|_{q'\to q} \ge t\big] \ge 2^{-t^q}$$

for $t = (ab)^{1/q}$, $a = 1, \ldots, m$, $b = 1, \ldots, n$. Together with the above upper bound on
$\mathbb{E}\|\tilde{X}\|_{q'\to q}$, this implies that in this situation, the concentration result of Theorem 1 is sharp
when $(\max\{m, n\})^{-1/q}\,t$ is sufficiently large, up to the values of numerical constants.
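The first inequality above holds because an $a \times b$ all-1 submatrix forces $\|\tilde{X}\|_{q'\to q} \ge (ab)^{1/q}$: test the operator on the indicator vector of the $b$ columns. The following minimal sketch (the name `all_ones_ratio` is ours) checks the value of this test ratio for the all-ones block itself and prints the corresponding probability $2^{-ab} = 2^{-t^q}$.

```python
import numpy as np


def all_ones_ratio(a, b, q):
    """||J x||_q / ||x||_{q'} for the a x b all-ones matrix J and x = (1, ..., 1)."""
    qc = q / (q - 1)
    J = np.ones((a, b))
    x = np.ones(b)
    return float(np.linalg.norm(J @ x, q) / np.linalg.norm(x, qc))


q = 4
for a, b in [(1, 1), (2, 3), (5, 5)]:
    t = all_ones_ratio(a, b, q)
    print(a, b, t, (a * b) ** (1 / q), 2.0 ** (-(t ** q)))  # t = (ab)^(1/q), prob. 2^(-ab)
```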

For $p, q$ in other ranges, one can derive concentration for $\|X\|_{p\to q}$ by comparing the $\ell_p$ or
$\ell_q$ norm to the $\ell_2$ norm of the appropriate dimension. In this case one will obtain concentration
on a scale which depends on $m$ or $n$. For example, in the situation of Theorem 1 one has

$$\mathbb{P}\Big[\big|\|X\|_{p\to q} - \mathbb{M}\|X\|_{p\to q}\big| \ge t\Big] \le 4\exp\left(-\frac{t^2}{4\,m^{2/q - 1}\,n^{1 - 2/p}\,D^2}\right)$$

if $1 \le q \le 2 \le p \le \infty$.
Since the conclusion of Theorem 1 is independent of dimension, one can derive the following
infinite-dimensional version for kernel operators from $\ell_p$ to $\ell_q$.

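Corollary 8. Let $1 \le p \le 2 \le q \le \infty$, and let $c_{jk} \ge 0$, $j, k \in \mathbb{N}$, be constants such that

$$\left(\sum_{j=1}^\infty \left(\sum_{k=1}^\infty c_{jk}^{p'}\right)^{q/p'}\right)^{1/q} < \infty. \tag{11}$$

Suppose that $x_{jk}$, $j, k \in \mathbb{N}$, are independent complex random variables each supported in a
set of diameter at most $D$, such that $|x_{jk}| \le c_{jk}$ for all $j, k$. Define the random operator
$X : \ell_p \to \ell_q$ by setting $X(e_k) = \sum_j x_{jk} e_j$. Then

$$\mathbb{P}\Big[\big|\|X\| - \mathbb{M}\|X\|\big| \ge t\Big] \le 4\exp\left(-\frac{t^r}{4D^r}\right)$$

for all $t > 0$, where $\|X\|$ is the operator norm of $X$ and $r = \min\{p', q\}$.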

We remark that when $p = q'$, the l.h.s. of (11) was shown by Persson to coincide
with both the $q$-summing norm $\pi_q(T)$ and the $q$-nuclear norm $\nu_q(T)$ of the kernel operator
$T : \ell_{q'} \to \ell_q$ given by $T(e_k) = \sum_j c_{jk} e_j$.

Proof of Corollary 8. The fact that $|x_{jk}| \le c_{jk}$ implies that $\|X\| < \infty$ always. Apply
Theorem 1 to the $n \times n$ upper-left corner of the infinite matrix $(x_{jk})$, and use (11) and the
estimate $|x_{jk}| \le c_{jk}$ to pass to the limit $n \to \infty$. □

Note that by taking $c_{jk} = 0$ when $j > m$ or $k > n$ in Corollary 8 we recover Theorem 1,
so that these two statements are formally equivalent.

We now specialize to the case in which $m = n$ and consider $X$ as an operator on $\ell_2$, so that
we use only the $q = 2$ case of Corollary 4. Guionnet and Zeitouni [?] were the first to note
that this concentration theorem implies normal concentration for any function on matrices
(or self-adjoint matrices) which is convex and Lipschitz with respect to the Hilbert-Schmidt
norm. For example, we have the following. Let the entries $x_{jk}$ of $X$ all be independent and
satisfy condition (ii) in the statement of Theorem 2, and for simplicity let $D = 1$. For
$1 \le p \le \infty$, we denote by $\|A\|_p$ the Schatten $p$-norm of an $n \times n$ matrix $A$ (see, e.g., [?]).
Then for all $t > 0$,

$$\mathbb{P}\Big[\big|\|X\|_p - \mathbb{M}\|X\|_p\big| \ge t\Big] \le 4e^{-t^2/4}$$

for $2 \le p \le \infty$, and

$$\mathbb{P}\Big[\big|\|X\|_p - \mathbb{M}\|X\|_p\big| \ge t\Big] \le 4\exp\left(-\frac{t^2}{4n^{2/p - 1}}\right)$$

for $1 \le p \le 2$. (In particular, we observe that when $p = q = 2$, the conclusion of Theorem
1 holds when the matrix entries $x_{jk}$ satisfy condition (ii) in the statement of Theorem 2.)
Furthermore, since $|||A||| \le \|A\|_1 \le \sqrt{n}\,\|A\|_2$ for any unitarily invariant norm $|||\cdot|||$ on
$M_n(\mathbb{C})$ satisfying $|||E_{11}||| = 1$, it follows that

$$\mathbb{P}\Big[\big||||X||| - \mathbb{M}|||X|||\big| \ge t\Big] \le 4e^{-t^2/(4n)}$$

for all $t > 0$ for any such norm. Each of these observations is in fact a special case of the tail
inequalities for norms of sums of independent vector-valued random variables which were
the original motivation for Talagrand’s development of Theorem 3 and related concentration
theorems.
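The Lipschitz constants behind these two Schatten-norm bounds are $1$ for $p \ge 2$ (since $\|A\|_p \le \|A\|_2$) and $n^{1/p - 1/2}$ for $p \le 2$ (Hölder's inequality on the $n$ singular values). The following minimal sketch, with an illustrative helper `schatten`, checks these comparisons and the chain $\|A\|_\infty \le \|A\|_1 \le \sqrt{n}\,\|A\|_2$ on a random complex matrix.

```python
import numpy as np


def schatten(A, p):
    """Schatten p-norm: the l_p norm of the singular values (p may be np.inf)."""
    return float(np.linalg.norm(np.linalg.svd(A, compute_uv=False), ord=p))


rng = np.random.default_rng(3)
n = 6
A = rng.uniform(-1, 1, (n, n)) + 1j * rng.uniform(-1, 1, (n, n))
hs = schatten(A, 2)  # Hilbert-Schmidt norm
for p in [1, 1.5, 2, 3, np.inf]:
    lip = n ** (1 / p - 1 / 2) if p < 2 else 1.0
    print(p, schatten(A, p), lip * hs)  # ||A||_p <= lip * ||A||_2
```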

We now consider eigenvalues of a self-adjoint random matrix. Although these are not
(except in the extreme cases) quasiconvex or quasiconcave functions, Corollary 4 can still be
used to derive concentration.


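Proof of Theorem 2. For simplicity, we assume $D = 1$. First observe that

$$\|X\|_2 = \left(\sum_{j,k=1}^n |x_{jk}|^2\right)^{1/2} = \sqrt{2}\left(\sum_{j=1}^n \left|\frac{x_{jj}}{\sqrt{2}}\right|^2 + \sum_{1 \le j < k \le n} |x_{jk}|^2\right)^{1/2}.$$

We suppose for simplicity that each of the upper-diagonal entries $x_{jk}$ for $j < k$ is supported
in a set of diameter at most 1. (The argument is similar in the case that for some $j < k$,
$x_{jk} = w_{jk}(\alpha_{jk} + i\beta_{jk})$ as in the statement of the theorem.) Note that $x_{jj}/\sqrt{2}$, $j = 1, \ldots, n$, and
$x_{jk}$, $1 \le j < k \le n$, are independent random variables in $\mathbb{R}$ or $\mathbb{C}$, each supported in a set of
diameter at most 1. Thus $\|X\|_2$ is $\sqrt{2}$ times the $\ell_2$ sum norm of the direct sum of $n$ copies of $\mathbb{R}$
and $\binom{n}{2}$ copies of $\mathbb{C}$ spanned by these variables.

Recall also that, for self-adjoint $A$,

$$\|A\|_2 = \left(\sum_{k=1}^n \lambda_k(A)^2\right)^{1/2},$$

which, together with the Hoffman-Wielandt inequality $\sum_{k=1}^n (\lambda_k(A) - \lambda_k(B))^2 \le \|A - B\|_2^2$,
implies that each $\lambda_k(X)$ is a 1-Lipschitz function of $X$ with respect to $\|X\|_2$. The first
claim now follows directly from Corollary 4 with $V_j = \mathbb{C}$ or $V_j = \mathbb{R}$ for each $j$, since $\lambda_1$ is a
convex function, and $\lambda_n$ is concave.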
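Both ingredients of this step can be checked numerically: the Hilbert-Schmidt identity at the start of the proof, and the Hoffman-Wielandt inequality, which makes each $\lambda_k$ 1-Lipschitz in $\|\cdot\|_2$. The following minimal sketch uses random Hermitian matrices. The helper names are illustrative, and `np.linalg.eigvalsh` returns eigenvalues in ascending order, so they are reversed to match $\lambda_1 \ge \cdots \ge \lambda_n$.

```python
import numpy as np


def eig_desc(A):
    """Eigenvalues lambda_1 >= ... >= lambda_n of a Hermitian matrix."""
    return np.linalg.eigvalsh(A)[::-1]


def random_hermitian(n, rng):
    B = rng.uniform(-1, 1, (n, n)) + 1j * rng.uniform(-1, 1, (n, n))
    return (B + B.conj().T) / 2


rng = np.random.default_rng(4)
n = 8
worst = 0.0
for _ in range(200):
    A, B = random_hermitian(n, rng), random_hermitian(n, rng)
    gap = np.linalg.norm(eig_desc(A) - eig_desc(B))  # l_2 distance between spectra
    worst = max(worst, gap / np.linalg.norm(A - B, 'fro'))
print(worst)  # Hoffman-Wielandt: never exceeds 1

# The Hilbert-Schmidt identity used at the start of the proof.
X = random_hermitian(n, rng)
diag = np.diag(X).real / np.sqrt(2)
upper = X[np.triu_indices(n, k=1)]
rhs = np.sqrt(2) * np.sqrt(np.sum(diag ** 2) + np.sum(np.abs(upper) ** 2))
print(np.linalg.norm(X, 'fro'), rhs)
```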

To prove the second claim, we introduce the following functions for a self-adjoint matrix
$A$. For $k = 1, \ldots, n$, let

$$F_k(A) = \sum_{j=1}^k \lambda_j(A), \qquad G_k(A) = \sum_{j=1}^k \lambda_{n-j+1}(A) = \operatorname{Tr} A - F_{n-k}(A),$$

with $F_0 = 0$. Then $F_k$ is positively homogeneous (of degree 1), and $F_k(-A) = -G_k(A)$. From this it
follows that

$$|F_k(A) - F_k(B)| \le \max\{F_k(A - B), -G_k(A - B)\} \le \sqrt{k}\,\|A - B\|_2,$$
$$|G_k(A) - G_k(B)| \le \sqrt{k}\,\|A - B\|_2.$$

Moreover, $F_k$ is convex and $G_k$ is concave for each $k$; this follows from Ky Fan’s maximum
principle (see, e.g., [?]) or Davis’s characterization of all convex unitarily invariant
functions of a self-adjoint matrix. Let $M_k = \mathbb{M}F_k - \mathbb{M}F_{k-1}$. Then by Corollary 4, for any
$0 < \theta < 1$,

$$\begin{aligned}
\mathbb{P}\big[|\lambda_k(X) - M_k| \ge t\big] &= \mathbb{P}\Big[\big|(F_k(X) - \mathbb{M}F_k(X)) - (F_{k-1}(X) - \mathbb{M}F_{k-1}(X))\big| \ge t\Big] \\
&\le \mathbb{P}\big[|F_k(X) - \mathbb{M}F_k(X)| \ge \theta t\big] + \mathbb{P}\big[|F_{k-1}(X) - \mathbb{M}F_{k-1}(X)| \ge (1 - \theta)t\big] \\
&\le 4\exp\left(-\left(\frac{\theta t}{2\sqrt{2k}}\right)^2\right) + 4\exp\left(-\left(\frac{(1 - \theta)t}{2\sqrt{2(k-1)}}\right)^2\right).
\end{aligned}$$

The estimate (3) now follows by letting $\theta = \sqrt{k}/(\sqrt{k} + \sqrt{k-1})$. (This is not the optimal
value of $\theta$, but optimizing at this point would only result in a slight sharpening of the
constants, and not of the dependence on $t$ or $k$.) The claim for $\lambda_{n-k+1}(X)$ follows similarly,
using $G_k(X)$ in place of $F_k(X)$, or as a formal consequence by replacing $X$ with $-X$. □
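The two facts used in this proof, the $\sqrt{k}$-Lipschitz bound for $F_k$ and the effect of the choice of $\theta$, can be checked numerically. The following minimal sketch uses random real symmetric matrices, an illustrative helper `ky_fan_sum`, and $D = 1$ as in the proof. With $\theta = \sqrt{k}/(\sqrt{k} + \sqrt{k-1})$, both terms in the last line have the same exponent, namely the one appearing in (3).

```python
import math

import numpy as np


def ky_fan_sum(A, k):
    """F_k(A): sum of the k largest eigenvalues of a Hermitian matrix."""
    return float(np.sum(np.linalg.eigvalsh(A)[::-1][:k]))


rng = np.random.default_rng(5)
n, k = 8, 3
ratios = []
for _ in range(200):
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    B = rng.standard_normal((n, n)); B = (B + B.T) / 2
    diff = abs(ky_fan_sum(A, k) - ky_fan_sum(B, k))
    ratios.append(diff / (math.sqrt(k) * np.linalg.norm(A - B, 'fro')))
print(max(ratios))  # at most 1: F_k is sqrt(k)-Lipschitz in the Hilbert-Schmidt norm

# With theta = sqrt(k) / (sqrt(k) + sqrt(k - 1)) the two exponents coincide.
t = 3.0
theta = math.sqrt(k) / (math.sqrt(k) + math.sqrt(k - 1))
e1 = theta * t / (2 * math.sqrt(2 * k))
e2 = (1 - theta) * t / (2 * math.sqrt(2 * (k - 1)))
target = t / (2 * math.sqrt(2) * (math.sqrt(k) + math.sqrt(k - 1)))
print(e1, e2, target)
```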


Now, for comparison, we let $H_n$ be an $n \times n$ random matrix with entries $h_{jk}$, $1 \le j, k \le n$,
such that:

(i) the entries $h_{jk}$, $1 \le j \le k \le n$, are independent Gaussian random variables,

(ii) the variance of $h_{jk}$ for $1 \le j < k \le n$ is at most 1,

(iii) the variance of $h_{jj}$ is at most $\sqrt{2}$ for $1 \le j \le n$, and

(iv) $h_{jk} = h_{kj}$ for $k < j$.

Then for each $1 \le k \le n$, Theorem 7 implies that

$$\mathbb{P}\big[|\lambda_k(H_n) - \mathbb{M}\lambda_k(H_n)| \ge t\big] \le e^{-t^2/4}$$

for all $t > 0$. This is comparable to the result of Theorem 2 for $\lambda_1(X)$ and $\lambda_n(X)$, but the
same level of concentration holds for eigenvalues in the bulk of the spectrum, which is not
the case in Theorem 2.

The result of Theorem 2 for $\lambda_1(X)$ and $\lambda_n(X)$ (stated in less generality) was shown by
Krivelevich and Vu in [?]. After a preliminary version of this paper was written, we learned
that Alon, Krivelevich, and Vu showed that for $1 \le k \le n$,

$$\mathbb{P}\big[|\lambda_k(X) - \mathbb{M}\lambda_k(X)| \ge t\big] \le 4\exp\left(-\frac{t^2}{32k^2}\right)$$

for all $t > 0$, and that the same holds if $\lambda_k(X)$ is replaced by $\lambda_{n-k+1}(X)$. The approach
in [?] handles the lack of convexity of $\lambda_k$ by not using the $q = 2$ case of Corollary 4 but
instead applying Theorem 3 by directly estimating the convex hull distances involved. Our
Theorem 2 improves the order of fluctuations of $\lambda_k(X)$ from $O(k)$ (as in [?]) to $O(\sqrt{k})$. It is
also conjectured in [?] that $\lambda_k(X)$ should be concentrated at least as strongly as $\lambda_1(X)$, as
one obtains from Theorem 7 in the Gaussian case. We emphasize again that we are dealing
only with large deviations here. As we have already indicated in the introduction, the tail
estimate (2) for the extreme eigenvalues is not sharp for $t = o(\sqrt{n})$; furthermore, it is likely
that concentration is even tighter for eigenvalues in the bulk of the spectrum.

It follows as in the discussion following Corollary 4 that Theorem 2 implies that $\mathbb{E}\lambda_k(X)$
differs by at most $O(\sqrt{k})$ from the number $M_k$ which appears in the statement of the theorem.
One can also show that the number $M_k$ which appears in the statement of the theorem differs
by at most $O(\sqrt{k})$ from $\mathbb{M}\lambda_k(X)$. By using the separate bounds for deviations above and
below the median in the situation of Corollary 4, we have

$$|M_k - \mathbb{M}\lambda_k(X)| \le 2\sqrt{6\log 2}\,\big(\sqrt{k} + \sqrt{k-1}\big)D.$$
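One way to see where this constant comes from is as follows, assuming the one-sided bounds $2\exp(-K_E(t)^2/4)$ from the proof of Proposition 5. Splitting $\lambda_k - M_k$ exactly as in the proof of Theorem 2 gives $\mathbb{P}[\lambda_k(X) - M_k \ge s] \le 4\exp\big(-(s/(2\sqrt{2}(\sqrt{k} + \sqrt{k-1})D))^2\big)$, and the same bound holds for deviations below $M_k$. Once this is at most $1/2$, every median of $\lambda_k(X)$ lies within $s$ of $M_k$. The following minimal sketch solves $4e^{-c^2} = 1/2$ and recovers $2\sqrt{2}\,c = 2\sqrt{6\log 2}$. The helper name is illustrative.

```python
import math

c = math.sqrt(3 * math.log(2))  # solves 4 * exp(-c**2) = 1/2


def median_gap_bound(k, D=1.0):
    """Smallest s with 4 * exp(-(s / (2 sqrt(2) (sqrt(k) + sqrt(k-1)) D))**2) <= 1/2."""
    return 2 * math.sqrt(2) * c * (math.sqrt(k) + math.sqrt(k - 1)) * D


for k in [2, 10, 100]:
    print(k, median_gap_bound(k), 2 * math.sqrt(6 * math.log(2)) * (math.sqrt(k) + math.sqrt(k - 1)))
```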

We can also obtain a similar result to Theorem 2 for singular values in the rectangular case.
Let $l = \min\{m, n\}$. For an $m \times n$ matrix $A$, we denote by $s_1(A) \ge s_2(A) \ge \cdots \ge s_l(A) \ge 0$
the singular values of $A$, counted with multiplicity; that is, $s_k(A) = \lambda_k((A^*A)^{1/2})$.
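The identity $s_k(A) = \lambda_k((A^*A)^{1/2})$ is easy to confirm numerically. The following short sketch compares NumPy's singular values (returned in descending order) with square roots of the eigenvalues of $A^*A$ for a random complex $4 \times 6$ matrix, for which $l = 4$ and $A^*A$ has two additional zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 4, 6
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
l = min(m, n)

s = np.linalg.svd(A, compute_uv=False)          # s_1 >= ... >= s_l
lam = np.linalg.eigvalsh(A.conj().T @ A)[::-1]  # eigenvalues of A*A, descending
print(s)
print(np.sqrt(lam[:l]))                         # the same values
```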


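Theorem 9. Suppose the entries $x_{jk}$ of $X$ are independent complex random variables, each
satisfying condition (ii) in the statement of Theorem 2. Then

$$\mathbb{P}\big[|s_1(X) - \mathbb{M}s_1(X)| \ge t\big] \le 4\exp\left(-\frac{t^2}{4D^2}\right)$$

for all $t > 0$. Furthermore, for each $2 \le k \le \min\{m, n\}$, there exists an $M_k \in \mathbb{R}$ such that

$$\mathbb{P}\big[|s_k(X) - M_k| \ge t\big] \le 8\exp\left(-\left(\frac{t}{2(\sqrt{k} + \sqrt{k-1})D}\right)^2\right) \le 8\exp\left(-\frac{t^2}{16kD^2}\right)$$

for all $t > 0$.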


The proof is similar to the proof of Theorem 2, using in place of the functions $F_k$ the Ky
Fan $k$-norms, defined by

$$\|A\|_{(k)} = \sum_{j=1}^k s_j(A)$$

for $1 \le k \le \min\{m, n\}$. We remark that the triangle inequality, and hence convexity, for the
Ky Fan norms can be proved as a formal consequence of the convexity of the functions $F_k$.
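The properties of the Ky Fan norms used in the proof of Theorem 9 can be checked numerically: the triangle inequality and the $\sqrt{k}$-Lipschitz bound with respect to the Hilbert-Schmidt norm. The following minimal sketch uses an illustrative helper `ky_fan_norm` and random real matrices. It also confirms that the extreme cases $k = 1$ and $k = \min\{m, n\}$ are the operator norm and the trace (nuclear) norm.

```python
import numpy as np


def ky_fan_norm(A, k):
    """||A||_(k): sum of the k largest singular values."""
    return float(np.sum(np.linalg.svd(A, compute_uv=False)[:k]))


rng = np.random.default_rng(7)
m, n = 5, 7
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
for k in range(1, min(m, n) + 1):
    triangle_ok = ky_fan_norm(A + B, k) <= ky_fan_norm(A, k) + ky_fan_norm(B, k) + 1e-12
    lip = abs(ky_fan_norm(A, k) - ky_fan_norm(B, k)) / (np.sqrt(k) * np.linalg.norm(A - B, 'fro'))
    print(k, triangle_ok, lip)  # lip <= 1: sqrt(k)-Lipschitz in the Hilbert-Schmidt norm
```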